69 research outputs found

Statistical Knowledge and Learning in Phonology

Author: Dunbar Ewan
Publication date: 01/01/2013

This thesis deals with the theory of the phonetic component of grammar in a formal probabilistic inference framework: (1) it has been recognized since the beginning of generative phonology that some language-specific phonetic implementation is actually context-dependent, and thus it can be said that there are gradient "phonetic processes" in grammar in addition to categorical "phonological processes." However, no explicit theory has been developed to characterize these processes. Meanwhile, (2) it is understood that language acquisition and perception are both really informed guesswork: the result of both types of inference can be reasonably thought to be a less-than-perfect commitment, with multiple candidate grammars or parses considered and each associated with some degree of credence. Previous research has used probability theory to formalize these inferences in implemented computational models, especially in phonetics and phonology. In this role, computational models serve to demonstrate the existence of working learning/perception/parsing systems assuming a faithful implementation of one particular theory of human language, and are not intended to adjudicate whether that theory is correct. The current thesis (1) develops a theory of the phonetic component of grammar and how it relates to the greater phonological system and (2) uses a formal Bayesian treatment of learning to evaluate this theory of the phonological architecture and to make predictions about how the resulting grammars will be organized. The coarse description of the consequence for linguistic theory is that the processes we think of as "allophonic" are actually language-specific, gradient phonetic processes, assigned to the phonetic component of grammar; strict allophones have no representation in the output of the categorical phonological grammar.

Digital Repository at the University of Maryland

Learning weakly supervised multimodal phoneme embeddings

Author: Chaabouni Rahma
Dunbar Ewan
Dupoux Emmanuel
Zeghidour Neil
Publication date: 01/01/2017

Recent works have explored deep architectures for learning multimodal speech representations (e.g. audio and images, articulation and audio) in a supervised way. Here we investigate the role of combining different speech modalities, i.e. audio and visual information representing lip movements, in a weakly supervised way using Siamese networks and lexical same-different side information. In particular, we ask whether one modality can benefit from the other to provide a richer representation for phone recognition in a weakly supervised setting. We introduce mono-task and multi-task methods for merging speech and visual modalities for phone recognition. The mono-task learning consists in applying a Siamese network on the concatenation of the two modalities, while the multi-task learning receives several different combinations of modalities at train time. We show that multi-task learning enhances discriminability for visual and multimodal inputs while minimally impacting auditory inputs. Furthermore, we present a qualitative analysis of the obtained phone embeddings, and show that cross-modal visual input can improve the discriminability of phonological features which are visually discernible (rounding, open/close, labial place of articulation), resulting in representations that are closer to abstract linguistic features than those based on audio only.

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

RNNs Implicitly Implement Tensor Product Representations

Author: Dunbar Ewan
Linzen Tal
McCoy R. Thomas
Smolensky Paul
Publication date: 05/03/2019

Recurrent neural networks (RNNs) can learn continuous vector representations of symbolic structures such as sequences and sentences; these representations often exhibit linear regularities (analogies). Such regularities motivate our hypothesis that RNNs that show such regularities implicitly compile symbolic structures into tensor product representations (TPRs; Smolensky, 1990), which additively combine tensor products of vectors representing roles (e.g., sequence positions) and vectors representing fillers (e.g., particular words). To test this hypothesis, we introduce Tensor Product Decomposition Networks (TPDNs), which use TPRs to approximate existing vector representations. We demonstrate using synthetic data that TPDNs can successfully approximate linear and tree-based RNN autoencoder representations, suggesting that these representations exhibit interpretable compositional structure; we explore the settings that lead RNNs to induce such structure-sensitive representations. By contrast, further TPDN experiments show that the representations of four models trained to encode naturally-occurring sentences can be largely approximated with a bag of words, with only marginal improvements from more sophisticated structures. We conclude that TPDNs provide a powerful method for interpreting vector representations, and that standard RNNs can induce compositional sequence representations that are remarkably well approximated by TPRs; at the same time, existing training tasks for sentence representation learning may not be sufficient for inducing robust structural representations. Comment: Accepted to ICLR 201
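The tensor product representation described in this abstract (a sum of tensor products of filler and role vectors) can be illustrated with a minimal sketch. The toy filler and role vectors below are hypothetical, and the code builds a TPR directly; it is not the authors' TPDN, which learns the embeddings needed to approximate an existing encoder's output.

```python
def outer(u, v):
    """Tensor (outer) product of two vectors, as a nested list."""
    return [[ui * vj for vj in v] for ui in u]


def tensor_product_representation(fillers, roles):
    """Sum the outer products of each filler vector with its role vector."""
    rows, cols = len(fillers[0]), len(roles[0])
    total = [[0.0] * cols for _ in range(rows)]
    for filler, role in zip(fillers, roles):
        for i, row in enumerate(outer(filler, role)):
            for j, value in enumerate(row):
                total[i][j] += value
    return total


def unbind(tpr, role):
    """Recover the filler bound to `role` (exact when roles are orthonormal)."""
    return [sum(x * r for x, r in zip(row, role)) for row in tpr]


# Hypothetical toy data: two words (fillers) in two sequence positions (roles).
fillers = {"cat": [1.0, 0.0, 2.0], "sat": [0.0, 3.0, 1.0]}
roles = [[1.0, 0.0], [0.0, 1.0]]  # orthonormal position vectors

tpr = tensor_product_representation([fillers["cat"], fillers["sat"]], roles)
first_word = unbind(tpr, roles[0])
```

With orthonormal role vectors, unbinding position 0 returns the "cat" vector exactly; with non-orthogonal roles the recovered filler would only be approximate.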

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Hal-Diderot

Extracting binary features from speech production errors and perceptual confusions using Redundancy-Corrected Transmission

Author: Dunbar Ewan
Fu Zhanao
Publication venue: ScholarWorks@UMass Amherst
Publication date: 01/06/2023

We develop a mutual information-based feature extraction method and apply it to English speech production and perception error data. The extracted features group phonemes differently from conventional phonological features, especially in the place features. We evaluate how well the extracted features can define natural classes to account for English phonological patterns. The features extracted from production errors performed nearly as well as conventional phonological features, while the features extracted from perception errors performed worse. The study shows that featural information can be extracted from underused sources of data such as confusion matrices of production and perception errors, and the results suggest that phonological patterning is more closely related to natural production errors than to perception errors in noisy speech.
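As a rough illustration of how mutual information can be read off a confusion matrix, the sketch below collapses a phone-by-phone matrix according to a candidate binary feature and computes the information transmitted for that feature. The counts, phone set, and feature assignments are hypothetical, and this is plain transmitted information; the redundancy correction named in the title is not described in the abstract and is not implemented here.

```python
from math import log2


def mutual_information(joint_counts):
    """Mutual information in bits between rows (stimuli) and columns (responses)."""
    total = sum(sum(row) for row in joint_counts)
    row_sums = [sum(row) for row in joint_counts]
    col_sums = [sum(col) for col in zip(*joint_counts)]
    mi = 0.0
    for i, row in enumerate(joint_counts):
        for j, count in enumerate(row):
            if count > 0:
                # p(i, j) * log2( p(i, j) / (p(i) * p(j)) )
                ratio = count * total / (row_sums[i] * col_sums[j])
                mi += (count / total) * log2(ratio)
    return mi


def feature_transmission(confusions, phones, feature):
    """Collapse a phone confusion matrix by a binary feature, then measure MI."""
    collapsed = [[0, 0], [0, 0]]
    for i, stimulus in enumerate(phones):
        for j, response in enumerate(phones):
            collapsed[feature[stimulus]][feature[response]] += confusions[i][j]
    return mutual_information(collapsed)


# Hypothetical confusion counts (rows: intended phone, columns: produced/heard phone).
phones = ["p", "b", "t", "d"]
confusions = [
    [6, 0, 4, 0],
    [0, 7, 0, 3],
    [5, 0, 5, 0],
    [0, 2, 0, 8],
]
voicing = {"p": 0, "b": 1, "t": 0, "d": 1}
place = {"p": 0, "b": 0, "t": 1, "d": 1}

voicing_bits = feature_transmission(confusions, phones, voicing)
place_bits = feature_transmission(confusions, phones, place)
```

In this toy matrix errors never cross the voicing boundary, so voicing is transmitted perfectly, while place is transmitted only weakly.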

ScholarWorks@UMass Amherst

Comparing unsupervised speech learning directly to human performance in speech perception

Author: Dunbar Ewan
Jurov Nika
Millet Juliette
Publication venue: HAL CCSD
Publication date: 24/07/2019

We compare the performance of humans (English and French listeners) versus an unsupervised speech model in a perception experiment (ABX discrimination task). Although the ABX task has been used for acoustic model evaluation in previous research, the results have not, until now, been compared directly with human behaviour in an experiment. We show that a standard, well-performing model (DPGMM) has better accuracy at predicting human responses than the acoustic baseline. The model also shows a native language effect, better resembling native listeners of the language on which it was trained. However, the native language effect shown by the models is different from the one shown by the human listeners, and, notably, the models do not show the same overall patterns of vowel confusions.
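The logic of an ABX discrimination test applied to a model can be sketched as follows: given representations of A, B, and X, where X belongs to the same category as A, the model counts as correct if X is closer to A than to B. The fixed-length vectors and the Euclidean distance below are simplifying assumptions for illustration; the abstract does not specify the distance used in the study.

```python
from math import dist


def abx_correct(a, b, x):
    """True if X is closer to A (its own category) than to B."""
    return dist(a, x) < dist(b, x)


def abx_accuracy(triplets):
    """Proportion of (A, B, X) triplets for which the model picks A."""
    return sum(abx_correct(a, b, x) for a, b, x in triplets) / len(triplets)


# Hypothetical model representations; X always shares A's category.
triplets = [
    ([0.0, 0.0], [4.0, 0.0], [1.0, 0.0]),  # X near A: correct
    ([0.0, 0.0], [4.0, 0.0], [3.0, 0.0]),  # X near B: error
]
accuracy = abx_accuracy(triplets)
```

Per-triplet outcomes of this kind are what would be compared with listeners' responses on the same stimuli.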

Analogies minus analogy test: measuring regularities in word embeddings

Author: Dunbar Ewan
Dupoux Emmanuel
Fournier Louis
Publication date: 01/01/2020

Vector space models of words have long been claimed to capture linguistic regularities as simple vector translations, but problems have been raised with this claim. We decompose and empirically analyze the classic arithmetic word analogy test, to motivate two new metrics that address the issues with the standard test, and which distinguish between class-wise offset concentration (similar directions between pairs of words drawn from different broad classes, such as France--London, China--Ottawa, ...) and pairing consistency (the existence of a regular transformation between correctly-matched pairs such as France:Paris::China:Beijing). We show that, while the standard analogy test is flawed, several popular word embeddings do nevertheless encode linguistic regularities.
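For reference, the classic arithmetic analogy test that this abstract analyzes can be sketched as below: to answer a:b::c:?, form the vector b - a + c and return the nearest remaining word by cosine similarity. The two-dimensional embeddings are hypothetical toy values, and excluding the three query words from the candidates is an assumption about how the standard test is usually run. The paper's two new metrics are not implemented here.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v)))


def solve_analogy(embeddings, a, b, c):
    """Answer a:b::c:? by finding the word nearest to b - a + c."""
    target = [bi - ai + ci
              for ai, bi, ci in zip(embeddings[a], embeddings[b], embeddings[c])]
    # Assumption: the query words themselves are excluded from the candidates.
    candidates = [w for w in embeddings if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(embeddings[w], target))


# Hypothetical toy embeddings in which country -> capital is a constant offset.
embeddings = {
    "France": [1.0, 0.0],
    "Paris": [1.0, 1.0],
    "China": [3.0, 0.0],
    "Beijing": [3.0, 1.0],
    "Ottawa": [0.0, 2.0],
}
answer = solve_analogy(embeddings, "France", "Paris", "China")
```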

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

Mouse tracking as a window into decision making

Author: Chemla Emmanuel
Dunbar Ewan
Maldonado Mora
Publication venue: Springer Science and Business Media LLC
Publication date: 01/06/2019

Mouse tracking promises to be an efficient method to investigate the dynamics of cognitive processes: It is easier to deploy than eyetracking, yet in principle it is much more fine-grained than looking at response times. We investigated these claimed benefits directly, asking how the features of decision processes, notably decision changes, might be captured in mouse movements. We ran two experiments, one in which we explicitly manipulated whether our stimuli triggered a flip in decision, and one in which we replicated more ecological, classical mouse-tracking results on linguistic negation (Dale & Duran, Cognitive Science, 35, 983–996, 2011). We concluded, first, that spatial information (mouse path) is more important than temporal information (speed and acceleration) for detecting decision changes, and we offer a comparison of the sensitivities of various typical measures used in analyses of mouse tracking (area under the trajectory curve, direction flips, etc.). We do so using an “optimal” analysis of our data (a linear discriminant analysis explicitly trained to classify trajectories) and examine which type of data (position, speed, or acceleration) it capitalizes on. We also quantify how its results compare with those based on more standard measures.
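Two of the standard measures named in this abstract, direction flips and the area under the trajectory curve, can be sketched as follows. The definitions are assumptions chosen for illustration (horizontal reversals, and the net area enclosed between the path and the straight start-to-end line via the shoelace formula); the paper's exact definitions may differ, and the sample trajectory is hypothetical.

```python
def x_flips(points):
    """Count reversals in the horizontal direction of movement."""
    flips, previous = 0, 0
    for (x0, _), (x1, _) in zip(points, points[1:]):
        step = (x1 > x0) - (x1 < x0)  # +1 right, -1 left, 0 no horizontal move
        if step != 0:
            if previous != 0 and step != previous:
                flips += 1
            previous = step
    return flips


def enclosed_area(points):
    """Net area between the path and the straight start-to-end line (shoelace)."""
    closed = points + points[:1]  # close the polygon back to the start point
    twice_area = sum(x0 * y1 - x1 * y0
                     for (x0, y0), (x1, y1) in zip(closed, closed[1:]))
    return abs(twice_area) / 2


# Hypothetical trajectory: the cursor heads right, then changes course to the left.
trajectory = [(0, 0), (2, 1), (1, 2), (-2, 3)]
flips = x_flips(trajectory)
area = enclosed_area(trajectory)
```

A perfectly straight trajectory yields zero flips and zero area, so larger values on either measure indicate deviation from a direct path.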

INRIA a CCSD electronic archive server

Edinburgh Research Explorer

Hal-Diderot

Tensor Product Decomposition Networks: Uncovering Representations of Structure Learned by Neural Networks

Author: Dunbar Ewan
Linzen Tal
McCoy Richard T
Smolensky Paul
Publication venue: ScholarWorks@UMass Amherst
Publication date: 01/01/2020

We introduce an analysis technique for understanding compositional structure present in the vector representations used by neural networks. The inner workings of neural networks are notoriously difficult to understand, and in particular it is far from clear how they manage to perform remarkably well on tasks that depend on compositional structure even though they use continuous vector representations with no obvious compositional structure. Using our analysis technique, we show that the representations of these models can be closely approximated by Tensor Product Representations, a type of interpretable structure that lends significant insight into the workings of these hard-to-interpret models.

ScholarWorks@UMass Amherst

The Zero Resource Speech Challenge 2017

Author: Anguera Xavier
Benjumea Juan
Bernard Mathieu
Besacier Laurent
Cao Xuan Nga
Dunbar Ewan
Dupoux Emmanuel
Karadayi Julien
Publication date: 12/12/2017

We describe a new challenge aimed at discovering subword and word units from raw speech. This challenge is the followup to the Zero Resource Speech Challenge 2015. It aims at constructing systems that generalize across languages and adapt to new speakers. The design features and evaluation metrics of the challenge are presented and the results of seventeen models are discussed. Comment: IEEE ASRU (Automatic Speech Recognition and Understanding) 2017. Okinawa, Japa

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server